Welcome to Data Science Computing!

Session 01: Introduction

Danilo Freire

Department of Quantitative Theory and Methods
Emory University

August 11, 2024

Welcome to QTM350 - Data Science Computing! 🥳 🎉

Lecture Overview

  • Introductions
  • Motivation
  • Class Logistics
  • Set up

Course Materials


Course website: https://danilofreire.github.io/qtm350


Course repository: https://github.com/danilofreire/qtm350


Much of this course lives on GitHub. You will find lecture materials, code, assignments, and other people’s presentations there. We will use Canvas for everything else.

Nice to meet you!

Instructor

A bit about me

Visiting Assistant Professor in the Department of Quantitative Theory and Methods (QTM)

MA from the Graduate Institute Geneva, PhD from King’s College London, Postdoc at Brown University, Senior Lecturer at the University of Lincoln, UK.

Research interests: policy evaluation, political violence, organised crime, computational social science, and experimental methods.

Things I really like

Office hours: what they are for and what they are not for

  • What these sessions are meant for:
    • Applying tools in practice
    • Discussion of issues related to the assignments
    • Boosting your knowledge of data science
  • What these sessions are not meant for:
    • Solving the assignments for you
    • Developing your coding skills for you

Class etiquette

  • Coding can be tough and push you out of your comfort zone. If the course pace is too fast, let us know. I expect your commitment, but I do not want anyone to fail.
  • You are all keen on data science, but your backgrounds vary. That is great! Some sessions might be more engaging than others. If you are bored, help others or explore new data science areas.
  • Always be respectful to each other.
  • Ask questions whenever you need to!

Motivation: What is data science?

An old classic

Rise of the digital information age

Social media data

New data formats

Survey data

Cheap computing power

As a consequence:

  • Abundance of data available for research and for governments to make better decisions

    • Opportunities for novel research questions

    • New methods to answer longstanding research questions

  • New technologies also have social implications and can raise important policy issues

    • Ethical concerns

    • Use of technology by malicious actors

    • Government use of technology to censor or monitor citizens

Course Overview and Logistics

Logistics

  • Syllabus: Available on the course website. Our course is centred around the data science workflow: the terminal, version control, academic/industry publishing, data storage, data management, data visualisation, and containerisation

  • Schedule: Lectures are on Tuesdays and Thursdays from 9:00 to 10:15 am. Labs are on Fridays from 9:00 to 10:15 am

  • Office Hours: I’m available to meet you at any time. And I mean it! Please reach out a couple of days in advance and we can schedule a meeting

  • Materials:

Assignments

  • Weekly assignments: Due on Fridays at 11:59 pm
  • Final project: Due on the last day of class
  • Grading: 50% assignments, 50% final project
  • Late policy: 10% off per day late
  • Collaboration: You can discuss assignments with your classmates, but you must write your own code and submit your own work
  • Feedback: I will provide feedback on your assignments and final project. I will also provide feedback on your code and data management practices
  • Academic integrity: Please refer to the syllabus for the university’s policy on academic integrity
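The grading and late-policy arithmetic can be sketched in Python. This is a minimal illustration under assumptions the slide does not spell out: that "10% off per day late" means 10% of the assignment's maximum score per full day, that a score cannot drop below zero, and that every weekly assignment counts equally towards the 50% assignment component. The function names are hypothetical.

```python
"""Sketch of the course grading arithmetic, under stated assumptions."""

# Assumption: the penalty is 10% of the MAXIMUM score for each full day late.
LATE_PENALTY_PER_DAY = 0.10


def apply_late_penalty(score: float, max_score: float, days_late: int) -> float:
    """Deduct 10% of max_score per day late, never going below zero."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    penalty = LATE_PENALTY_PER_DAY * max_score * days_late
    return max(score - penalty, 0.0)


def final_grade(assignment_pcts: list[float], project_pct: float) -> float:
    """Combine assignments and final project with 50/50 weights.

    Assumption: all weekly assignments are weighted equally, so the
    assignment component is their simple mean (in percent).
    """
    if not assignment_pcts:
        raise ValueError("need at least one assignment score")
    mean_assignments = sum(assignment_pcts) / len(assignment_pcts)
    return 0.5 * mean_assignments + 0.5 * project_pct


if __name__ == "__main__":
    late = apply_late_penalty(90, 100, days_late=2)
    print("90/100 submitted two days late:", late)
    print("Final grade:", final_grade([80, 90, late], project_pct=85))
```

Under this reading, a 90/100 submitted two days late would count as 70.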

Set up

Software

  • Git: A version control system. Download it here; installation instructions are here. Feel free to configure it now if you wish (instructions here), but we will cover configuration in class.

  • GitHub: An online platform for hosting code repositories. You will use it a lot, and not only for this class. Create an account on GitHub and register for the student/educator discount. You will soon receive an invitation to the course organisation on GitHub and to GitHub Classroom, which we will use to distribute and submit assignments and to return feedback and grades.
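A quick way to confirm that Git is installed is to ask it for its version with `git --version`. The Python sketch below wraps that check; it assumes Python 3.9 or later, and the helper names `parse_git_version` and `installed_git_version` are hypothetical, not part of any course tooling.

```python
"""Check whether Git is installed and report its version (a sketch)."""
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_git_version(output: str) -> Tuple[int, ...]:
    """Extract the leading numeric version from e.g. 'git version 2.43.0'."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unexpected output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_git_version() -> Optional[Tuple[int, ...]]:
    """Return Git's version as a tuple, or None if git is not on the PATH."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return parse_git_version(result.stdout)


if __name__ == "__main__":
    version = installed_git_version()
    if version is None:
        print("Git not found: please install it before the next class.")
    else:
        print("Git found, version", ".".join(map(str, version)))
```

Running the file prints either the detected version or a reminder to install Git. Parsing only the leading numbers keeps the check working when the output carries platform suffixes, such as the `.windows.1` tag that Git for Windows appends.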

OS extras

Next class

  • We will start with the basics of the terminal and version control
  • Please make sure that you have a terminal installed on your computer
  • If you have any questions, please reach out to me
  • Recommended readings for the next class are available on the course website: https://danilofreire.github.io/qtm350-2024/

...and that’s it for today! 😊

Questions?

Thank you very much for your attention! Have a great day! 😊